AITopics | data mixing

Mix and Reason: Reasoning over Semantic Topology with Data Mixing for Domain Generalization

Neural Information Processing SystemsDec-25-2025, 10:30:51 GMT

Domain generalization (DG) enables generalizing a learning machine from multiple seen source domains to an unseen target one. The general objective of DG methods is to learn semantic representations that are independent of domain labels, which is theoretically sound but empirically challenged due to the complex mixture of common and domain-specific factors. Although disentangling the representations into two disjoint parts has been gaining momentum in DG, the strong presumption over the data limits its efficacy in many real-world scenarios. In this paper, we propose Mix and Reason (MiRe), a new DG framework that learns semantic representations via enforcing the structural invariance of semantic topology. MiRe consists of two key components, namely, Category-aware Data Mixing (CDM) and Adaptive Semantic Topology Refinement (ASTR). CDM mixes two images from different domains in virtue of activation maps generated by two complementary classification losses, making the classifier focus on the representations of semantic objects. ASTR introduces relation graphs to represent semantic topology, which is progressively refined via the interactions between local feature aggregation and global cross-domain relational reasoning. Experiments on multiple DG benchmarks validate the effectiveness and robustness of the proposed MiRe.

data mixing, mix and reason, semantic topology, (7 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning (0.83)

Add feedback

Mix and Reason: Reasoning over Semantic Topology with Data Mixing for Domain Generalization Chaoqi Chen 1 Luyao T ang

Neural Information Processing SystemsAug-19-2025, 07:19:37 GMT

Domain generalization (DG) enables generalizing a learning machine from multiple seen source domains to an unseen target one.

artificial intelligence, generalization, machine learning, (15 more...)

Neural Information Processing Systems

Country:

Asia > China > Hong Kong (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > China > Fujian Province > Xiamen (0.04)

Genre: Research Report (0.46)

Industry: Health & Medicine > Diagnostic Medicine > Imaging (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(2 more...)

Add feedback

Mix and Reason: Reasoning over Semantic Topology with Data Mixing for Domain Generalization

Neural Information Processing SystemsJan-19-2025, 00:34:52 GMT

Domain generalization (DG) enables generalizing a learning machine from multiple seen source domains to an unseen target one. The general objective of DG methods is to learn semantic representations that are independent of domain labels, which is theoretically sound but empirically challenged due to the complex mixture of common and domain-specific factors. Although disentangling the representations into two disjoint parts has been gaining momentum in DG, the strong presumption over the data limits its efficacy in many real-world scenarios. In this paper, we propose Mix and Reason (MiRe), a new DG framework that learns semantic representations via enforcing the structural invariance of semantic topology. MiRe consists of two key components, namely, Category-aware Data Mixing (CDM) and Adaptive Semantic Topology Refinement (ASTR).

data mixing, mix and reason, representation, (5 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.67)
Information Technology > Artificial Intelligence > Machine Learning (0.43)

Add feedback

Data Mixing Made Efficient: A Bivariate Scaling Law for Language Model Pretraining

Ge, Ce, Ma, Zhijian, Chen, Daoyuan, Li, Yaliang, Ding, Bolin

arXiv.org Artificial IntelligenceJul-11-2024

Large language models exhibit exceptional generalization capabilities, primarily attributed to the utilization of diversely sourced data. However, conventional practices in integrating this diverse data heavily rely on heuristic schemes, lacking theoretical guidance. This research tackles these limitations by investigating strategies based on low-cost proxies for data mixtures, with the aim of streamlining data curation to enhance training efficiency. Specifically, we propose a unified scaling law, termed $\textbf{BiMix}$, which accurately models the bivariate scaling behaviors of both data quantity and mixing proportions. We conduct systematic experiments and provide empirical evidence for the predictive power and fundamental principles of $\textbf{BiMix}$. Notably, our findings reveal that entropy-driven training-free data mixtures can achieve comparable or even better performance than more resource-intensive methods. We hope that our quantitative insights can shed light on further judicious research and development in cost-effective language modeling.

bivariate scaling law, data mixing, language model pretraining

arXiv.org Artificial Intelligence

2405.14908

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Natural Language (1.00)

Add feedback

Mix and Reason: Reasoning over Semantic Topology with Data Mixing for Domain Generalization

Chen, Chaoqi, Tang, Luyao, Liu, Feng, Zhao, Gangming, Huang, Yue, Yu, Yizhou

arXiv.org Artificial IntelligenceOct-14-2022

Domain generalization (DG) enables generalizing a learning machine from multiple seen source domains to an unseen target one. The general objective of DG methods is to learn semantic representations that are independent of domain labels, which is theoretically sound but empirically challenged due to the complex mixture of common and domain-specific factors. Although disentangling the representations into two disjoint parts has been gaining momentum in DG, the strong presumption over the data limits its efficacy in many real-world scenarios. In this paper, we propose Mix and Reason (MiRe), a new DG framework that learns semantic representations via enforcing the structural invariance of semantic topology. MiRe consists of two key components, namely, Category-aware Data Mixing (CDM) and Adaptive Semantic Topology Refinement (ASTR). CDM mixes two images from different domains in virtue of activation maps generated by two complementary classification losses, making the classifier focus on the representations of semantic objects. ASTR introduces relation graphs to represent semantic topology, which is progressively refined via the interactions between local feature aggregation and global cross-domain relational reasoning. Experiments on multiple DG benchmarks validate the effectiveness and robustness of the proposed MiRe.

artificial intelligence, machine learning, object-oriented architecture, (16 more...)

arXiv.org Artificial Intelligence

2210.07571

Country:

Asia > China > Hong Kong (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > China > Fujian Province > Xiamen (0.04)

Genre: Research Report (0.64)

Industry: Health & Medicine > Diagnostic Medicine > Imaging (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Object-Oriented Architecture (0.66)

Add feedback

Collaborating Authors

data mixing

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

Mix and Reason: Reasoning over Semantic Topology with Data Mixing for Domain Generalization

Mix and Reason: Reasoning over Semantic Topology with Data Mixing for Domain Generalization Chaoqi Chen 1 Luyao T ang

Mix and Reason: Reasoning over Semantic Topology with Data Mixing for Domain Generalization

Data Mixing Made Efficient: A Bivariate Scaling Law for Language Model Pretraining

Mix and Reason: Reasoning over Semantic Topology with Data Mixing for Domain Generalization